<!DOCTYPE html>
<html>
<head>

<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap.min.css">
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap-responsive.min.css">

<style type="text/css">
.nav { }
.nav li { float: left; width: 110px; }
.container { background-color: #ffffff; padding: 30px; }
.content { padding: 30px; }
body, h1, h2, h3, h4 { font-family: "Trebuchet MS","Helvetica Neue",Arial,Helvetica,sans-serif,"Georgia";}
body { background-color: #eeeeee; }
header { width: 100%; }
blockquote p { font-size: 14px; }
</style>
<title>ZPar | Dependency Parsing</title>
</head>

<body>
<div class="table-bordered container">
<div class="content">

<header>
<div class="page-header text-center">
<h1>Dependency Parsing</h1>
</div>
</header>


<h2><a id="How-to-compile">How to compile</a></h2>
<p>
Suppose that ZPar has been downloaded to the directory <code>zpar</code>. To build a dependency parsing system for English, type <code>make english.depparser</code>. This creates the directory <code>zpar/dist/english.depparser</code>, which contains two files: <code>train</code> and <code>depparser</code>. The <code>train</code> executable trains a parsing model, and the <code>depparser</code> executable parses new texts using a trained parsing model. Similarly, typing <code>make chinese.depparser</code> builds a dependency parsing system for Chinese, with the <code>train</code> and <code>depparser</code> files created under <code>zpar/dist/chinese.depparser</code>.
</p>

<h2><a id="Format-of-inputs-and-outputs">Format of inputs and outputs</a></h2>
<p>
The input file to the <code>train</code> executable contains a set of parse trees. An example parse tree is as follows:
</p>

<pre><code>
Ms.        NNP    1
Haag       NNP    2
plays      VBZ    -1
Elianti    NNP    2
.          .      2
</code></pre>

<p>
Here the first column contains the words of the sentence, the second column contains the POS tags of the words, and the third column contains the indices of the heads of the words. Indices start from 0. For example, the head index of the word <code>Ms.</code> is 1, which means its head is <code>Haag</code>. The head index of the root word of the sentence is -1. Note that on each line, tab characters separate the word, the POS tag, and the head index.
</p>
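<p>
The column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of ZPar: the function names <code>read_trees</code> and <code>head_word</code> are hypothetical, and the sketch assumes that consecutive sentences are separated by blank lines, which the description above does not state.
</p>

```python
def read_trees(lines):
    """Group tab-separated (word, POS, head) lines into sentences.

    Assumption: sentences are separated by blank lines.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, head = line.split("\t")
        current.append((word, pos, int(head)))
    if current:
        sentences.append(current)
    return sentences


def head_word(sentence, index):
    """Return the head word of the token at `index`, or None for the root."""
    head = sentence[index][2]
    return None if head == -1 else sentence[head][0]


# The example tree from this page, with tabs between the columns.
example = [
    "Ms.\tNNP\t1",
    "Haag\tNNP\t2",
    "plays\tVBZ\t-1",
    "Elianti\tNNP\t2",
    ".\t.\t2",
]
trees = read_trees(example)
print(head_word(trees[0], 0))
```

<p>
With the example tree, <code>head_word(trees[0], 0)</code> looks up the head of <code>Ms.</code>, and the root word <code>plays</code> yields <code>None</code>.
</p>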
<p>
The input file to the <code>depparser</code> executable contains POS tagged sentences. The formats for English and Chinese are different.
</p>
<p><strong>English</strong>:</p>
<pre><code>Ms./NNP Haag/NNP plays/VBZ Elianti/NNP</code></pre>
<p><strong>Chinese</strong>:</p>
<pre><code>ZPar_NR 可以_MD 分析_VV 中文_NN 和_CC 英文_NN</code></pre>

<p>
For Chinese, inputs to both <code>train</code> and <code>depparser</code> must be encoded in UTF-8.
</p>
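<p>
The two input formats differ only in the character that joins a word to its POS tag: <code>/</code> in the English example and <code>_</code> in the Chinese example. The following sketch shows one way to produce and read such lines. The helper names <code>format_tagged</code> and <code>parse_tagged</code> are hypothetical, and splitting on the last separator is a choice made for this sketch, not documented ZPar behaviour.
</p>

```python
def format_tagged(tokens, separator):
    """Join (word, tag) pairs into one line of POS-tagged input."""
    return " ".join(word + separator + tag for word, tag in tokens)


def parse_tagged(line, separator):
    """Split a POS-tagged line back into (word, tag) pairs.

    rsplit cuts at the last separator only, so a separator character
    inside the word itself stays part of the word (an assumption of
    this sketch).
    """
    return [tuple(token.rsplit(separator, 1)) for token in line.split()]


english = [("Ms.", "NNP"), ("Haag", "NNP"), ("plays", "VBZ"), ("Elianti", "NNP")]
english_line = format_tagged(english, "/")

chinese_line = "ZPar_NR 可以_MD 分析_VV 中文_NN 和_CC 英文_NN"
chinese = parse_tagged(chinese_line, "_")

# Chinese input must be UTF-8, so encode explicitly when writing a file.
chinese_bytes = (chinese_line + "\n").encode("utf-8")
```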

<h2><a id="How-to-train-a-model">How to train a model</a></h2>
<p>
To train a model, use
</p>
<pre><code> zpar/dist/english.depparser/train &lt;train-file&gt; &lt;model-file&gt; &lt;number of iterations&gt; </code></pre>
<p>
For example, using the English <a href="eng_dep_files/train.txt">example train file</a>, you can train a model with
</p>
<pre><code> zpar/dist/english.depparser/train train.txt model 1 </code></pre>
<p>
After training is completed, a new file <code>model</code> will be created in the current directory, which can be used to parse POS-tagged sentences. The above command performs training with one iteration (see <a href="#tuning">How to tune the performance of a system</a>) using the training file.
</p>
<p>
The commands for training Chinese parsing models are the same. For example, using the Chinese <a href="chn_dep_files/train.txt">example train file</a>, you can train a model by
</p>
<pre><code> zpar/dist/chinese.depparser/train train.txt model 1 </code></pre>
<h2><a id="How-to-parse-new-texts">How to parse new texts</a></h2>
<p>
To apply an existing model to parse new texts, use
</p>
<pre><code> zpar/dist/english.depparser/depparser &lt;model&gt; &lt;input-file&gt; &lt;output-file&gt;</code></pre>

<p>
For example, using the model we just trained, we can parse <a href="eng_dep_files/input.txt">an example input</a> by
</p>
<pre><code> zpar/dist/english.depparser/depparser model input.txt output.txt</code></pre>

<p>
The output file contains automatically parsed trees. The commands for parsing Chinese texts are the same. See an example <a href="chn_dep_files/input.txt">Chinese input file</a>.
</p>
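<p>
The training and parsing commands above can also be driven from a script. The sketch below only assembles the argument lists in the order shown on this page and leaves the actual call in a function that is not invoked here. The function names are hypothetical, and the paths assume the script is started from the directory that contains <code>zpar</code>.
</p>

```python
import subprocess


def train_command(dist_dir, train_file, model_file, iterations):
    """Build the argument list for the train executable."""
    return [dist_dir + "/train", train_file, model_file, str(iterations)]


def parse_command(dist_dir, model_file, input_file, output_file):
    """Build the argument list for the depparser executable."""
    return [dist_dir + "/depparser", model_file, input_file, output_file]


def train_and_parse(dist_dir):
    """Train for one iteration, then parse input.txt into output.txt.

    check=True raises CalledProcessError if either program fails.
    """
    subprocess.run(train_command(dist_dir, "train.txt", "model", 1), check=True)
    subprocess.run(parse_command(dist_dir, "model", "input.txt", "output.txt"), check=True)


english_dir = "zpar/dist/english.depparser"
print(" ".join(train_command(english_dir, "train.txt", "model", 1)))
print(" ".join(parse_command(english_dir, "model", "input.txt", "output.txt")))
```

<p>
Calling <code>train_and_parse("zpar/dist/chinese.depparser")</code> would run the same two steps for Chinese, since the commands take the same arguments.
</p>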
<h2><a id="Outputs-and-evaluation">Outputs and evaluation</a></h2>
<p>
To evaluate the quality of the outputs, we can manually specify the gold parse trees for a sample of sentences and compare the parser outputs against them.
</p>
<p>
Manually specified parse trees of the input file are given in <a href="eng_dep_files/reference.txt">this example reference file</a> (find a Chinese reference file <a href="chn_dep_files/reference.txt">here</a>).  Here is a <a href="eng_dep_files/evaluate.py">Python script</a> that performs automatic evaluation.
</p>
<p>
Using the above <code>output.txt</code> and <code>reference.txt</code>, we can evaluate the accuracies by typing
</p>
<pre><code> python evaluate.py output.txt reference.txt</code></pre>
<p>
The script reports the precision, recall, and f-score. See the explanation of these measures on <a href="http://en.wikipedia.org/wiki/Precision_and_recall">Wikipedia</a>.
</p>
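<p>
To make the three measures concrete, the following sketch scores one sentence by comparing predicted and gold head indices. It is not the <code>evaluate.py</code> script linked above, and what that script counts exactly is not described on this page. The sketch assumes that each (word index, head index) pair is one dependency arc, and the predicted heads are made up for illustration.
</p>

```python
def arcs(heads):
    """Turn a list of head indices into a set of (dependent, head) arcs."""
    return set(enumerate(heads))


def score(predicted_heads, gold_heads):
    """Return (precision, recall, f-score) over dependency arcs."""
    predicted, gold = arcs(predicted_heads), arcs(gold_heads)
    correct = len(predicted.intersection(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Gold heads of the example tree on this page.
gold = [1, 2, -1, 2, 2]
# Hypothetical parser output that attaches the final period incorrectly.
predicted = [1, 2, -1, 2, 3]

precision, recall, f_score = score(predicted, gold)
print(precision, recall, f_score)
```

<p>
Because the output and the reference contain the same words, every word contributes exactly one arc on each side, so precision and recall coincide in this sketch.
</p>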
<h2><a id="tuning">How to tune the performance of a system</a></h2>
<p>
The performance of the system after one training iteration may not be optimal. You can train the model for several more iterations and compare the performance after each one, then choose the model that gives the highest f-score on your test data. This test file is conventionally called the development test data, because it is used to develop the parsing model. Here is <a href="eng_dep_files/test.sh">a shell script</a> that automatically trains the parser for 30 iterations and, after the ith iteration, stores the model file to <code>model.i</code>. You can compare the f-scores of all 30 iterations and choose <code>model.k</code>, the one that gives the best f-score, as the final model. The script contains a variable called <code>zpar</code>, which you need to set to the relative path of <code>zpar/dist/english.depparser</code> or <code>zpar/dist/chinese.depparser</code>.
</p>
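<p>
The selection step can be sketched as follows. The f-scores in the dictionary are placeholder values, not measured results, and the function name <code>best_iteration</code> is hypothetical. In practice the scores would come from evaluating each <code>model.i</code> on the development test data.
</p>

```python
def best_iteration(f_scores):
    """Return (iteration, f-score) for the highest f-score.

    Iterations are visited in ascending order and max() keeps the first
    maximum it sees, so ties go to the earliest iteration.
    """
    return max(sorted(f_scores.items()), key=lambda item: item[1])


# Placeholder f-scores keyed by training iteration (illustrative only).
dev_f_scores = {1: 0.50, 2: 0.60, 3: 0.70, 4: 0.70, 5: 0.65}

k, best = best_iteration(dev_f_scores)
final_model = "model.{}".format(k)
print(final_model, best)
```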
<h2><a id="Source-code">Source code</a></h2>
<p>
The source code for the English dependency parser can be found at
</p>
<pre><code> zpar/src/common/depparser/implementation/ENGLISH_DEPPARSER_IMPL</code></pre>

<p>
where <code>ENGLISH_DEPPARSER_IMPL</code> is a macro defined in <code>Makefile</code> that selects the implementation used for the English dependency parser.
</p>
<p>
The source code for the Chinese dependency parser can be found at
</p>
<pre><code> zpar/src/common/depparser/implementation/CHINESE_DEPPARSER_IMPL</code></pre>
<p>
where <code>CHINESE_DEPPARSER_IMPL</code> is a macro defined in <code>Makefile</code> that selects the implementation used for the Chinese dependency parser.
</p>

<h2><a id="reference">Reference</a></h2>
<ul>
<li>
Yue Zhang and Stephen Clark. 2008. A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing Using Beam-search. In <em>Proc. of EMNLP</em>, pages 562-571.
</li>
<li>
Yue Zhang and Joakim Nivre. 2011. Transition-based Dependency Parsing with Rich Non-local Features. In <em>Proc. of ACL</em>, pages 188-193.
</li>
</ul>

</div>
</div>
<footer class="text-center">
<p>
ZPar Release 0.7
</p>
</footer>
</body>
</html>
